Application-aware Adaptive DRAM Bank Partitioning in CMP

Authors

  • Kenji Kise
  • Takakazu Ikeda
Abstract

Main memory is a resource shared among the cores in a chip, and the speed gap between cores and main memory limits total system performance. Main memory must therefore be accessed efficiently by each core. Exploiting both the parallelism and the locality of main memory is the key to efficient memory access: parallelism between memory banks hides latency by pipelining memory accesses, and locality of memory accesses improves the hit ratio of the row buffer in DRAM chips. A state-of-the-art method called bpart has been proposed to improve memory access efficiency. In bpart, each bank is monopolized by a single thread, and this monopolization improves row buffer locality by alleviating inter-thread interference. However, bpart is not effective for threads with poor locality, and it does not exploit bank-level parallelism. I propose a new bank partitioning method that exploits parallelism in addition to locality. The method applies two types of bank usage: low-locality threads share banks to improve parallelism, while each high-locality thread monopolizes its own bank to improve row buffer locality. I evaluate the proposed method with my in-house software simulator and the SPEC CPU 2006 benchmarks. On average, system throughput increases by 1.0% and minimum speedup (a fairness metric) increases by 7.9% relative to bpart. These results show that the proposed method achieves better performance and fairness than bpart.
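As a concrete illustration of the two bank-usage types described above, the sketch below classifies threads by their measured row-buffer hit rate: each high-locality thread receives a private bank, while the remaining banks form a shared pool over which low-locality threads interleave their accesses. This is only a minimal sketch of the idea, not the authors' implementation; the threshold value, the interval-based statistics, and the data structures are illustrative assumptions.

```cpp
// Minimal sketch (not the authors' implementation) of application-aware
// adaptive bank partitioning. Threads whose measured row-buffer hit rate
// exceeds a threshold are treated as high-locality and each gets a private
// bank; all other threads share the remaining banks to expose bank-level
// parallelism. Threshold and statistics are illustrative assumptions.
#include <cstdio>
#include <vector>

struct ThreadStats {
    int id;            // dense thread id (assumed 0..N-1)
    double row_hits;   // row-buffer hits observed in the last interval
    double accesses;   // total DRAM accesses in the last interval
};

struct Partition {
    std::vector<int> private_bank_of_thread; // thread id -> private bank, or -1
    std::vector<int> shared_banks;           // banks shared by low-locality threads
};

Partition partition_banks(const std::vector<ThreadStats>& threads,
                          int num_banks,
                          double locality_threshold = 0.5) {
    Partition p;
    p.private_bank_of_thread.assign(threads.size(), -1);

    int next_bank = 0;
    for (const auto& t : threads) {
        double hit_rate = (t.accesses > 0) ? t.row_hits / t.accesses : 0.0;
        // High-locality thread: give it a bank of its own so its open rows
        // are not closed by other threads (preserves row-buffer locality).
        if (hit_rate >= locality_threshold && next_bank < num_banks - 1)
            p.private_bank_of_thread[t.id] = next_bank++;
    }
    // Low-locality threads interleave over all remaining banks, so their
    // requests can be pipelined across banks (bank-level parallelism).
    for (int b = next_bank; b < num_banks; ++b)
        p.shared_banks.push_back(b);
    return p;
}

int main() {
    std::vector<ThreadStats> threads = {{0, 900, 1000}, {1, 100, 1000},
                                        {2, 800, 1000}, {3, 150, 1000}};
    Partition p = partition_banks(threads, 8);
    for (size_t i = 0; i < threads.size(); ++i)
        std::printf("thread %zu -> %s\n", i,
                    p.private_bank_of_thread[i] >= 0 ? "private bank" : "shared pool");
    return 0;
}
```

In this example, threads 0 and 2 (90% and 80% row-buffer hit rates) each receive a private bank, while threads 1 and 3 spread their requests over the six remaining banks.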


Similar articles

Enhancing the Performance and Fairness of Shared DRAM Systems with Parallelism-Aware Batch Scheduling

Onur Mutlu, Thomas Moscibroda (Microsoft Research). In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts, but they can also destro...
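The sketch below illustrates only the general batching idea suggested by the title: outstanding requests are periodically grouped into a batch with a per-thread, per-bank cap, and batched requests are served before newer ones so that a single thread's burst cannot monopolize a bank. The cap and the tie-breaking rules are illustrative assumptions, not the paper's exact policy.

```cpp
// Rough sketch of a batch-oriented DRAM request prioritizer. Only the
// general idea is shown; the marking cap and tie-breaking order are
// illustrative assumptions rather than the published policy.
#include <algorithm>
#include <cstdio>
#include <tuple>
#include <vector>

struct Request {
    int thread_id;
    int bank;
    bool row_hit;        // would hit the currently open row
    long arrival;        // arrival time, lower = older
    bool marked = false; // belongs to the current batch
};

// Form a new batch: mark at most `cap` oldest requests per (thread, bank)
// pair so that no single thread can flood a bank and starve the others.
void form_batch(std::vector<Request>& queue, int cap) {
    std::sort(queue.begin(), queue.end(),
              [](const Request& a, const Request& b) { return a.arrival < b.arrival; });
    std::vector<std::vector<int>> count; // count[thread][bank]
    for (auto& r : queue) {
        if ((int)count.size() <= r.thread_id) count.resize(r.thread_id + 1);
        if ((int)count[r.thread_id].size() <= r.bank) count[r.thread_id].resize(r.bank + 1, 0);
        r.marked = (count[r.thread_id][r.bank]++ < cap);
    }
}

// Pick the next request: batched requests first, then row hits, then the
// oldest request, so newcomers cannot indefinitely delay the current batch.
const Request* pick_next(const std::vector<Request>& queue) {
    auto key = [](const Request& x) {
        return std::make_tuple(!x.marked, !x.row_hit, x.arrival);
    };
    const Request* best = nullptr;
    for (const auto& r : queue)
        if (!best || key(r) < key(*best)) best = &r;
    return best;
}

int main() {
    std::vector<Request> q = {{0, 0, true, 1}, {0, 0, false, 2},
                              {1, 0, false, 3}, {1, 1, true, 4}};
    form_batch(q, /*cap=*/1); // admit at most one request per thread per bank
    if (const Request* next = pick_next(q))
        std::printf("next: thread %d, bank %d\n", next->thread_id, next->bank);
    return 0;
}
```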


EASE: Energy-Aware Self-Optimizing DRAM Scheduling

We propose an energy-aware self-optimizing memory controller in which DRAM energy management is integral to the command scheduler. Experiments conducted on an 8-core CMP model show that, for the parallel applications considered, our scheduler reduces memory's energy-delay-squared product (Et²) by 18% while delivering a 5% speedup with respect to a state-of-the-art power-aware solution.


Adaptive Zone-Aware Multi-Bank On-Chip Last-Level L2 Cache Partitioning for Chip Multiprocessors

This paper proposes a novel efficient Non-Uniform Cache Architecture (NUCA) scheme for the Last-Level Cache (LLC) to reduce the average on-chip access latency and improve core isolation in Chip Multiprocessors (CMP). The architecture proposed is expected to improve upon the various NUCA schemes proposed so far, such as S-NUCA, D-NUCA and SP-NUCA [9][10][5], in terms of average access latency witho...


DRAM-Aware Last-Level Cache Replacement

The cost of last-level cache misses and evictions depends significantly on three major performance-related characteristics of DRAM-based main memory systems: bank-level parallelism, row buffer locality, and write-caused interference. Bank-level parallelism and row buffer locality introduce different latency costs for the processor to service misses: parallel or serial, fast or slow. Write-caused...
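The following sketch makes the latency argument concrete: a row-buffer hit needs only a column access, a row conflict additionally pays precharge and activate, and two misses are serialized when they target the same bank but can overlap when they target different banks. The timing values are illustrative DDR3-like numbers, not figures from the paper.

```cpp
// Illustrative miss-cost comparison; the cycle counts are assumed
// DDR3-like values, not parameters taken from the paper.
#include <algorithm>
#include <cstdio>

constexpr int tRP  = 11; // precharge: close the currently open row
constexpr int tRCD = 11; // activate: open the requested row
constexpr int tCL  = 11; // column access: read from the open row

// Latency of one DRAM access given the state of the target bank's row buffer.
int access_latency(bool row_hit) {
    return row_hit ? tCL               // row-buffer hit: column access only
                   : tRP + tRCD + tCL; // row conflict: close, open, then read
}

int main() {
    // Two row conflicts to different banks can proceed in parallel
    // (bank-level parallelism), so the stall is roughly one conflict latency.
    int parallel = std::max(access_latency(false), access_latency(false));
    // Two row conflicts to the same bank are serialized.
    int serial = access_latency(false) + access_latency(false);
    std::printf("row hit: %d cycles, row conflict: %d cycles\n",
                access_latency(true), access_latency(false));
    std::printf("two conflicts, different banks: ~%d cycles; same bank: ~%d cycles\n",
                parallel, serial);
    return 0;
}
```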


Process Variation Aware DRAM (Dynamic Random Access Memory) Design Using Block-Based Adaptive Body Biasing Algorithm





Publication year: 2014